CART variance stabilization and regularization for high-throughput genomic data
Abstract
MOTIVATION: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances, and a complex relationship between mean expression value and variance is often seen. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve the overall accuracy of test statistics.
RESULTS: A Classification and Regression Tree (CART) procedure is introduced for variance stabilization as well as regularization. The CART procedure adaptively clusters genes by variances. Using both local and cluster-wide information leads to improved estimation of population variances, which in turn improves test statistics, while cluster-wide information allows for variance stabilization of the data.
AVAILABILITY: Sufficient details of the CART procedure are given so that interested readers can program the method themselves. The algorithm is also accessible within the Java software package BAMarray(TM), which is freely available to non-commercial users at www.bamarray.com.
CONTACT: [email protected]
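The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a greedy one-dimensional regression tree that splits genes sorted by sample variance, a fixed shrinkage weight that blends each gene's variance with its cluster mean, and stabilization by dividing by the cluster's pooled standard deviation. All function names and the parameters `max_clusters`, `min_size`, and `weight` are hypothetical.

```python
from math import sqrt
from statistics import mean


def sse(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def best_split(values, min_size):
    """Best cut point of sorted `values`; returns (index, gain) or (None, 0.0)."""
    total = sse(values)
    best_i, best_gain = None, 0.0
    for i in range(min_size, len(values) - min_size + 1):
        gain = total - sse(values[:i]) - sse(values[i:])
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain


def cluster_by_variance(variances, max_clusters=3, min_size=2):
    """Greedy regression-tree style clustering of genes by sample variance.

    Assumption: at each step the cluster whose best split gives the largest
    reduction in within-cluster sum of squares is divided in two.
    """
    order = sorted(range(len(variances)), key=lambda g: variances[g])
    clusters = [order]
    while len(clusters) < max_clusters:
        best = None  # (gain, cluster position, split index)
        for pos, genes in enumerate(clusters):
            i, gain = best_split([variances[g] for g in genes], min_size)
            if i is not None and (best is None or gain > best[0]):
                best = (gain, pos, i)
        if best is None:
            break
        _, pos, i = best
        genes = clusters[pos]
        clusters[pos:pos + 1] = [genes[:i], genes[i:]]
    return clusters


def regularized_variances(variances, clusters, weight=0.5):
    """Blend each gene's own (local) variance with its cluster-wide mean."""
    out = [0.0] * len(variances)
    for genes in clusters:
        pooled = mean(variances[g] for g in genes)
        for g in genes:
            out[g] = weight * variances[g] + (1 - weight) * pooled
    return out


def stabilize(expression, clusters, variances):
    """Divide each gene's values by its cluster's pooled standard deviation."""
    out = [None] * len(expression)
    for genes in clusters:
        sd = sqrt(mean(variances[g] for g in genes))
        for g in genes:
            out[g] = [x / sd for x in expression[g]]
    return out
```

In this sketch the regularized variance uses both local and cluster-wide information, while stabilization uses only the cluster-wide estimate, mirroring the distinction drawn in the abstract. A data-driven choice of the number of clusters and of the shrinkage weight would be needed in practice.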
Related resources
Cyber-T web server: differential analysis of high-throughput data
The Bayesian regularization method for high-throughput differential analysis, described in Baldi and Long (A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 2001: 17: 509-519) and implemented in the Cyber-T web server, is one of the most widely validated. Cyber-T implements a t-test using a Bayesian...
Statistical Learning Methods for High Dimensional Genomic Data
Due to their high-dimensionality, -omics technologies require the development of computational methods that are able to work with a large number of variables. Each data type is characterized by its method of measurement and by the biological aspect under study. Understanding the data properties allows the design of sophisticated and effective computational models that are able to uncover and expl...
Nootropic Medicinal Plants; Evaluating Potent Formulation By Novelestic High throughput Pharmacological Screening (HTPS) Method
The principle of this method was to screen the pharmacological activity of six prepared polyphyto formulations for their nootropic action using a high-throughput screening method. The study was performed in three stages, using one, two and three animals per group, respectively. Test formulations were given p.o. daily at doses of 50 and 100 mg/kg body weight. The test formulations were compar...
glmgraph: an R package for variable selection and predictive modeling of structured genomic data
One central theme of modern high-throughput genomic data analysis is to identify relevant genomic features as well as build up a predictive model based on selected features for various tasks such as personalized medicine. Correlating the large number of 'omics' features with a certain phenotype is particularly challenging due to small sample size (n) and high dimensionality (p). To a...
Variance of the number of false discoveries
In high-throughput genomic work, a very large number d of hypotheses are tested based on n ≪ d data samples. The large number of tests necessitates an adjustment for false discoveries, in which a true null hypothesis is rejected. The expected number of false discoveries is easy to obtain. Dependencies among the hypothesis tests greatly affect the variance of the number of false discoveries. Assum...
Journal title:
- Bioinformatics
Volume 22, Issue 18
Pages -
Publication date: 2006